Skip to content

Added i18n component and related scripts #1082

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 81 commits into
base: master
Choose a base branch
from

Conversation

yihao03
Copy link
Member

@yihao03 yihao03 commented Mar 28, 2025

run yarn trans to start translation

yihao03 and others added 27 commits March 12, 2025 22:12
…toring segment handling to use arrays for better management
… merging logic and remove unnecessary debug logs
…ded try catch when parsing xml to json to report failed parsing possibly attributed to unsound xml structure.
- Created a new XML file for references (97references97.xml) containing a comprehensive list of references used in the SICP JS project.
- Added a new XML file for the index preface (98indexpreface98.xml) to provide context and formatting for the index section.
- Introduced a new XML file for the making section (99making99.xml) detailing the background, interactive features, and development history of the SICP JS project.
- Updated subsection2.xml to close the previously open SUBSECTION tag and include a comment for clarity.
@yihao03
Copy link
Member Author

yihao03 commented Apr 12, 2025

breaking changes are made: xml repositories are divided into folders, currently consisting of en and cn folders to store translated content. The same applies to the json folder after running "yarn json". Frontend needs to be changed accordingly to fetch json files from the corresponding url

@yihao03 yihao03 requested a review from arnav-goel10 May 2, 2025 05:02
baseURL: process.env.AI_BASEURL
});

const MAXLEN = Number(process.env.MAX_LEN) || 3000;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The MAXLEN constant should be imported from the config.ts file instead of being redefined here. The config.ts already defines max_chunk_len for the same purpose with the same default value (3000).

Having configuration values defined in multiple places leads to maintenance issues and potential inconsistencies if one location is updated but not the other.

Suggestion: Replace with import { max_chunk_len } from "../config"; and use that value directly.

Comment on lines 105 to 107
console.log(
`Summary log saved to logs/translation-summary-${timestamp}.log`
);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The log path is constructed differently in the code versus the message displayed to users:

// Actual path construction:
const logPath = path.join(
    logDir,
    `${translationSummaryPrefix}-${timestamp}.log`
);

// But message shows hardcoded format:
console.log(
    `Summary log saved to logs/translation-summary-${timestamp}.log`
);

import { Readable } from "stream";
import { ignoredTags, max_chunk_len } from "../config";
import fs, { PathLike } from "fs";
import { FileLike } from "openai/uploads.mjs";

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The import import { FileLike } from "openai/uploads.mjs"; appears to be unused in this file. There are no references to the FileLike type throughout the code.

import fs, { PathLike } from "fs";
import { FileLike } from "openai/uploads.mjs";

const MAXLEN = max_chunk_len;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code creates an unnecessary alias for max_chunk_len. Additionally, throughout the code there are instances of Number(MAXLEN) which is redundant since max_chunk_len should already be a Number type in the config file.

import fs from "fs";
import OpenAI from "openai";
import path, { dirname } from "path";
import Stream from "stream";

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Stream module is imported but never used.

@coder114514 coder114514 requested a review from arnav-goel10 June 5, 2025 14:03
@coder114514 coder114514 self-assigned this Jun 5, 2025
Copy link

@arnav-goel10 arnav-goel10 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Member

@RichDom2185 RichDom2185 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did not look at the contents of the zh_CN folder. I assume they are generated by AI? Or was there manual editing? If the former, please add a gitattributes file so that reviewers (and GitHub) knows it's machine generated

Comment on lines +33 to +37
- name: Clone translated_xmls
run: |
git clone -b translated_xmls https://github.com/source-academy/sicp.git translated_xmls
mv translated_xmls/* xml/
rm -r translated_xmls
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we cloning from a specific branch instead of merging it to the default branch?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you explain what these new workflows are for?

Copy link

@coder114514 coder114514 Jul 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This workflow is for triggering translations for all those English XMLs that are changed. The workflow is not set to be triggered on pushes yet because the AI translation is not very stable yet.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the translate-everything workflow is for translating all English XMLs

Comment on lines +38 to +39
echo API_KEY=${{ secrets.OPENAI_KEY }} >> .env
# echo API_KEY=${{ secrets.OPENAI_KEY2 }} >> .env
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If they are no longer used, the secret should be deleted.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the start of the project, me and Yi Hao were both given an API key, so there are two secrets. Now one is used as a backup.

Comment on lines +29 to +30
echo API_KEY=${{ secrets.OPENAI_KEY }} >> .env
# echo API_KEY=${{ secrets.OPENAI_KEY2 }} >> .env
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto.

@coder114514
Copy link

Did not look at the contents of the zh_CN folder. I assume they are generated by AI? Or was there manual editing? If the former, please add a gitattributes file so that reviewers (and GitHub) knows it's machine generated

All the AI translated XMLs will go into the translated_xmls branch, so I will delete the zh_CN folder. Should I add gitattributes file to the translated_xmls branch to mark they are generated by AI?

@RichDom2185
Copy link
Member

Did not look at the contents of the zh_CN folder. I assume they are generated by AI? Or was there manual editing? If the former, please add a gitattributes file so that reviewers (and GitHub) knows it's machine generated

All the AI translated XMLs will go into the translated_xmls branch, so I will delete the zh_CN folder. Should I add gitattributes file to the translated_xmls branch to mark they are generated by AI?

How is the zh_CN folder different from the translated_xmls branch? Why not just merge the translations directly to master instead of using a separate branch?

@coder114514
Copy link

Did not look at the contents of the zh_CN folder. I assume they are generated by AI? Or was there manual editing? If the former, please add a gitattributes file so that reviewers (and GitHub) knows it's machine generated

All the AI translated XMLs will go into the translated_xmls branch, so I will delete the zh_CN folder. Should I add gitattributes file to the translated_xmls branch to mark they are generated by AI?

How is the zh_CN folder different from the translated_xmls branch? Why not just merge the translations directly to master instead of using a separate branch?

They are the same. The translated_xmls branch is separately created because in the workflow for translation (translate-changed, translate-everything) an action called deploy which pushes files to a branch is used to push AI generated translations to GitHub, so the branch is created for this.

@RichDom2185
Copy link
Member

Did not look at the contents of the zh_CN folder. I assume they are generated by AI? Or was there manual editing? If the former, please add a gitattributes file so that reviewers (and GitHub) knows it's machine generated

All the AI translated XMLs will go into the translated_xmls branch, so I will delete the zh_CN folder. Should I add gitattributes file to the translated_xmls branch to mark they are generated by AI?

How is the zh_CN folder different from the translated_xmls branch? Why not just merge the translations directly to master instead of using a separate branch?

They are the same. The translated_xmls branch is separately created because in the workflow for translation (translate-changed, translate-everything) an action called deploy which pushes files to a branch is used to push AI generated translations to GitHub, so the branch is created for this.

I see, noted, and what is the rationale of running it in a workflow, as opposed to running it locally?

@coder114514
Copy link

Did not look at the contents of the zh_CN folder. I assume they are generated by AI? Or was there manual editing? If the former, please add a gitattributes file so that reviewers (and GitHub) knows it's machine generated

All the AI translated XMLs will go into the translated_xmls branch, so I will delete the zh_CN folder. Should I add gitattributes file to the translated_xmls branch to mark they are generated by AI?

How is the zh_CN folder different from the translated_xmls branch? Why not just merge the translations directly to master instead of using a separate branch?

They are the same. The translated_xmls branch is separately created because in the workflow for translation (translate-changed, translate-everything) an action called deploy which pushes files to a branch is used to push AI generated translations to GitHub, so the branch is created for this.

I see, noted, and what is the rationale of running it in a workflow, as opposed to running it locally?

sicp.sourceacademy.org is a static site, so the translated XMLs need to be deployed too, and thus we are storing them on GitHub.

This is not really about the translation workflows. The translator can be run locally and then the generated content can be manually pushed to the translated_xmls branch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants